Statistical Machine Translation for Query Expansion in Answer Retrieval
Authors
Abstract
We present an approach to query expansion in answer retrieval that uses Statistical Machine Translation (SMT) techniques to bridge the lexical gap between questions and answers. SMT-based query expansion is done by (i) using a full-sentence paraphraser to introduce synonyms in the context of the entire query, and (ii) translating query terms into answer terms using a full-sentence SMT model trained on question-answer pairs. We evaluate these global, context-aware query expansion techniques on tf-idf retrieval from 10 million question-answer pairs extracted from FAQ pages. Experimental results show that SMT-based expansion improves retrieval performance over local expansion and over retrieval without expansion.
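As a rough illustration of how expanding a query with answer-side terms can bridge the lexical gap in tf-idf retrieval, here is a minimal sketch. It is not the method from the paper: the full-sentence SMT model is replaced by a hand-written word-level lookup table (`TRANSLATION_TABLE`, hypothetical), and the three answers, the query, and all function names are invented for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification)."""
    return text.lower().split()


def build_index(docs):
    """Return one tf-idf vector per document plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    # Smoothed idf so that every indexed term gets a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand(terms, table):
    """Append answer-side terms listed in the table for each query term."""
    expanded = list(terms)
    for term in terms:
        for candidate in table.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def retrieve(query, vectors, idf, table=None):
    """Return (index, score) of the best-scoring document for the query."""
    terms = tokenize(query)
    if table is not None:
        terms = expand(terms, table)
    # Query terms that never occur in the collection are dropped.
    qvec = {t: tf * idf[t] for t, tf in Counter(terms).items() if t in idf}
    scores = [cosine(qvec, v) for v in vectors]
    best = max(range(len(vectors)), key=scores.__getitem__)
    return best, scores[best]


# Toy data, invented for this sketch.
ANSWERS = [
    "restart the router and check the cable",
    "bake the bread for thirty minutes",
    "reset your password from the account page",
]
# Hypothetical stand-in for a trained question-to-answer translation model.
TRANSLATION_TABLE = {
    "fix": ["restart", "check"],
    "internet": ["router", "cable"],
}

VECTORS, IDF = build_index(ANSWERS)
QUERY = "how to fix internet"
plain_result = retrieve(QUERY, VECTORS, IDF)
expanded_result = retrieve(QUERY, VECTORS, IDF, TRANSLATION_TABLE)
```

In this toy setup the unexpanded query shares no term with any answer, so every document scores zero, while the expanded query matches the first answer. A word-level table like this ignores the rest of the query; the abstract's point is that its expansion is chosen in the context of the entire query.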
Similar resources
An Improvement in Cross-Language Document Retrieval Based on Statistical Models
This paper presents a method that integrates three statistical models (a translation model, a query generation model, and a document retrieval model) for cross-language document retrieval. Given a document in the source language, it is translated into the target language by the statistical machine translation model. The query generation model then selects the most relevant wo...
Combining lexical and statistical translation evidence for cross-language information retrieval
This paper explores how best to use lexical and statistical translation evidence together for Cross-Language Information Retrieval (CLIR). Lexical translation evidence is assembled from Wikipedia and from a large machine readable dictionary, statistical translation evidence is drawn from parallel corpora, and evidence from co-occurrence in the document language provides a basis for limiting the ...
Cross-Language Retrieval Using HAIRCUT for CLEF 2004
JHU/APL continued to explore the use of knowledge-light methods for scalable multilingual retrieval during the CLEF 2004 evaluation. We relied on the language-neutral techniques of character n-gram tokenization, pre-translation query expansion, statistical translation using aligned parallel corpora, fusion from disparate retrievals, and reliance on language similarity when resources are scarce....
QEA: A New Systematic and Comprehensive Classification of Query Expansion Approaches
A major problem in information retrieval is the difficulty of defining the user's information needs; on the other hand, when the user submits a query, there is a vast amount of information to retrieve. Different methods have therefore been suggested for query expansion, which is concerned with reconfiguring the query to increase efficiency and improve the criterion accuracy in the information...
DCU's Experiments for the NTCIR-8 IR4QA Task
We describe DCU's participation in the NTCIR-8 IR4QA task [16]. This is a cross-language information retrieval (CLIR) task from English to Simplified Chinese that seeks to provide relevant documents for later cross-language question answering (CLQA) tasks. For the IR4QA task, we submitted 5 official runs, including two monolingual runs and three CLIR runs. For the monolingual retrieval we ...